Original work: https://github.com/dongwookim-ml/almc
Paper: https://arxiv.org/pdf/1608.05921.pdf
To speed up the algorithm, we re-design the updating steps.
We consider:
We might do not need to update posteriors for all entities and relations.
For a sequential updating algorithm (Thompson sampling), it doesn't make sense to use all observed labels in each iteration. i.e.
where $x_t$ is the label observed in $t^{th}$ iteration.
Based on the above consideration, we come up with the following design ideas:
Assume we observe $x_{ijk}$ in $t^{th}$ iteration, we only update the posterior of $e_i, e_j, r_k$ using the new label $x_{ijk}$.
$\textbf{Prior}$:
$$P(\mathbf{e_i}|\sigma_e) = \mathcal{N}(\mathbf{e_i}| \mathbf{u_e}, {\sigma_e}^2 I_D)$$$$P(\mathbf{R_k}|\sigma_r) = \mathcal{MN}(\mathbf{R_k}| \mathbf{u_r}, {\sigma_r} I_D, {\sigma_r} I_D)$$or eqivalently, $$P(\mathbf{r_k}|\sigma_r) = \mathcal{N}(\mathbf{r_k}| \mathbf{u_r}, {\sigma_r}^2 I_{D^2})$$ where $r_k = vec(R_k) \in \mathcal{R}^{D^2,1}$
$\textbf{Likelihood}$:
$$p(x_{ikj}|\mathbf{e_i, e_j}, R_k) = \mathcal{N}(x_{ikj}| \mathbf{e_i}^T R_k \mathbf{e_j}, \sigma_x^2)$$using the identity $\mathbf{e_i}^T R_k \mathbf{e_j} = r_k^T \mathbf{e_i} \otimes \mathbf{e_j}$, $$p(x_{ikj}|\mathbf{e_i, e_j, r_k}) = \mathcal{N}(x_{ikj}| \mathbf{r_k}^T \mathbf{e_i} \otimes \mathbf{e_j}, \sigma_x^2)$$
$\textbf{Entity Posterior}$:
$$P(\mathbf{e_i}|x_{ikj}, \mathbf{e_j}, R_k, \sigma_e) = \mathcal{N}(\mathbf{e_i}| m_{eN}, s_{eN}) \propto P(\mathbf{e_i}|\sigma_e)P(x_{ikj}|\mathbf{e_i, e_j}, R_k) = \mathcal{N}(\mathbf{e_i}| \mathbf{u_e}, {\sigma_e}^2 I_D) \mathcal{N}(x_{ikj}| \mathbf{e_i}^T R_k \mathbf{e_j}, \sigma_x^2)$$We know for $c \mathcal{N}(\mathbf{x|c, C}) = \mathcal{N}(\mathbf{x|a, A})\mathcal{N}(\mathbf{x|b, B})$,
\begin{equation} \mathbf{C = {(A^{-1} + B ^{-1)})}^{-1}} \end{equation}$$\mathbf{c = C(A^{-1}a + B^{-1}b)}$$So the goal is to transform $\mathcal{N}(x_{ikj}| r_k^T \mathbf{e_i} \otimes \mathbf{e_j}, \sigma_x^2)$ into $\mathcal{N}(\mathbf{e_i}| M x_{ikj}, \sigma_x^2 MM^T)$
Assume $R_k \mathbf{e_j}$ is column full rank, $$x_{ikj} = \mathbf{e_i^T}R_k\mathbf{e_j} \Leftrightarrow \mathbf{e_i} = (R_k \mathbf{e_j})^{-T}x_{ikj}$$
Similarly, assum $\mathbf{e_i}^T R_k$ is column full rank, for $P(e_j|x_{ikj}, \mathbf{e_i}, R_k, \sigma_e)$ we have
$$s_{eN} = (\sigma_e^{-2} I_D + \sigma_x^{-2} (\mathbf{e_i}^T R_k)^T(\mathbf{e_i}^T R_k))^{-1}$$$$m_{eN} = s_{eN} (\sigma_e^{-2} \mathbf{u_e} + \sigma_x^{-2} (\mathbf{e_i}^T R_k)^T x_{ikj} )$$$\textbf{Relation Posterior}:$
$$P(\mathbf{r_k}|x_{ikj}, \mathbf{e_i, e_j}, \sigma_r) = \mathcal{N}(\mathbf{r_k}|m_{rN}, s_{rN}) \propto P(\mathbf{r_k|\sigma_r})P(x_{ikj}|\mathbf{e_i, e_j, r_k}) = \mathcal{N}(\mathbf{r_k}| \mathbf{u_r}, {\sigma_r}^2 I_{D^2}) \mathcal{N}(x_{ikj}| \mathbf{r_k}^T \mathbf{e_i} \otimes \mathbf{e_j}, \sigma_x^2)$$Similarly, assume $\mathbf{e_i} \otimes \mathbf{e_j}$ is column full rank,
$$ x_{ikj} = \mathbf{r_k}^T \mathbf{e_i} \otimes \mathbf{e_j} \Leftrightarrow \mathbf{r_k} = (\mathbf{e_i} \otimes \mathbf{e_j}) ^{-T} x_{ikj}$$
In [ ]: